A Quantum Approach to Vision Language Modelling

Mehrnoosh Sadrzadeh (University College London)

Fri May 22, 15:30-16:45

Abstract: Vision-language models excel at large-scale image-text alignment but often neglect the compositional structure of language, leading to failures on tasks that hinge on word order and predicate-argument structure. We show how techniques from tensor networks and variational quantum circuits help address this problem. To this end, we introduce two tools, DisCoCLIP and QuCLIP, multimodal encoders that combine a frozen CLIP vision transformer with a tensor network text encoder that explicitly encodes syntactic structure. We also work with translations of syntax into variational quantum circuits. We train both models with a self-supervised contrastive loss and show that they improve on compositional benchmarks such as SVO-Probes and ARO while using significantly fewer parameters. This parameter reduction is a known feature of tensor networks and variational quantum circuits; in this case, it was on average from hundreds of millions to tens of thousands.

Speaker's bio: Mehrnoosh is a Professor of CS, leads UCL CS's Quantum Learning Labs, and is the CS Director of Research. Her research is supported by a Royal Academy of Engineering (RAEng) Research Chair, jointly with the BBC and Quantinuum Ltd. Mehrnoosh’s UG and MSc studies were at Sharif University in Iran, and her PhD is from the University of Quebec at Montreal. Previously, Mehrnoosh held two RAEng Industrial Fellowships, at QMUL and UCL, an EPSRC Career Acceleration Fellowship at Oxford, and an EPSRC PDRF and a Wolfson College Junior Research Fellowship, also at Oxford.

Moderator's bio: Ted Theodosopoulos is a mathematician who, after working for years in academia and industry, transitioned to teaching at the pre-college level sixteen years ago, the last eight at Nueva, where he teaches math and economics. Ted’s research background is in the area of interacting stochastic systems, with particular applications in biology and economics.

Computer science, Mathematics

Audience: researchers in the topic

Relatorium seminar

Series comments: The name "Relatorium" combines "relator" with the Latin suffix "-ium," meaning "a place for activities" (as in "auditorium" or "gymnasium"). This seminar series is a platform to relate ideas, interact with math, and connect with each other.

In this series, we explore math beyond what we usually hear in standard talks. These sessions fall somewhere between a technical talk and a podcast: moderately formal, yet conversational. The philosophy behind the series is that math is best learned by active participation rather than passive listening. Our aim is to “engage and involve,” inviting everyone to think actively with the speaker. The concepts are accessible, exploratory, and intended to spark questions and discussions.

The idea of relatability has strong ties to compassion, that is, to creating space for shared understanding and exploration, which is the spirit of this seminar! This is a pilot project, so we’re here to improvise, learn, and evolve as we go!

Organizers: Priyaa Varshinee, Tim Hosgood, Niels Voorneveld, Irfan Alam
